[57] ABSTRACT

Interactive interfaces to video information provide a displayed
view of a quasi-object called a root image. The root image
consists of a plurality of basic frames selected from the video
information, arranged such that their respective x and y
directions are aligned with the x and y directions in the root
image and the z direction in the root image corresponds to time,
such that the basic frames are spaced apart in the z direction of
the root image in accordance with their time separation. The
displayed view of the root image changes in accordance with a
designated viewing position, as if the root image were a
three-dimensional object. The user can manipulate the displayed
image by designating different viewing positions, by selecting
portions of the video information for playback, and by special
effects, such as cutting open the quasi-object for a better view.
A toolkit permits interface designers to design such interfaces,
notably so as to control the types of interaction which will be
possible between the interface and an end user. Implementations
of the interfaces may include editors and viewers.

41 Claims, 14 Drawing Sheets 
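
The geometry summarized in the abstract can be illustrated with a minimal sketch. It is not taken from the patent: the names `BasicFrame`, `z_positions`, `project_point` and `project_frame_corners`, the `z_per_second` scale factor, and the simple pinhole projection with the viewer looking straight along the z (time) axis are all assumptions chosen for illustration. Oblique viewing angles would additionally need a rotation of the coordinates before projection.

```python
from dataclasses import dataclass
from math import isclose


@dataclass
class BasicFrame:
    """One basic frame of the root image (hypothetical structure)."""
    index: int      # frame number in the source video sequence
    time_s: float   # time of the frame in seconds


def z_positions(frames, z_per_second=1.0):
    """Space the frames along z in proportion to their time separation."""
    t0 = frames[0].time_s
    return [(f.time_s - t0) * z_per_second for f in frames]


def project_point(point, eye, focal=1.0):
    """Pinhole projection of a 3D point for a viewer at `eye` looking along +z."""
    x, y, z = point
    ex, ey, ez = eye
    depth = z - ez
    if depth <= 0:
        raise ValueError("point is not in front of the viewing position")
    return (focal * (x - ex) / depth, focal * (y - ey) / depth)


def project_frame_corners(width, height, z, eye, focal=1.0):
    """Project the four corners of a frame lying in the plane at depth z."""
    corners = [(0, 0, z), (width, 0, z), (width, height, z), (0, height, z)]
    return [project_point(c, eye, focal) for c in corners]


# Three basic frames taken at 0 s, 0.5 s and 2 s of the sequence.
frames = [BasicFrame(0, 0.0), BasicFrame(12, 0.5), BasicFrame(50, 2.0)]
depths = z_positions(frames, z_per_second=10.0)

# Moving the viewing position changes where each frame appears on screen.
near_view = project_frame_corners(4, 3, depths[1] + 10, eye=(0, 0, 0))
far_view = project_frame_corners(4, 3, depths[1] + 10, eye=(0, 0, -15))
```

In this sketch a frame recorded later in time sits further along z, so it is drawn smaller from a given viewing position, and moving the viewing position further away shrinks every frame.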
INTERACTIVE VIDEO ICON WITH DESIGNATED VIEWING POSITION

The present invention relates to the field of interfaces for
video information. More particularly, the present invention
provides interactive interfaces for video information and
toolkits for use in creation of such interactive interfaces.

BACKGROUND AND SUMMARY

Video information is being produced at an ever-increasing
rate and video sequences, especially short sequences, are
increasingly being used, for example, in websites and on
CD-ROM, and being created, for example, by domestic use
of camcorders. There is a growing need for tools enabling
the indexing, handling and interaction with video data. It is
particularly necessary for interfaces to be provided which
enable a user to access video information selectively and to
interact with that information, especially in a non-sequential
way.

Conventionally, video information consists of a sequence
of frames recorded at a fixed time interval. In the case of
classic television signals, for example, the video information
consists of 25 or 30 frames per second. Each frame is
meaningful since it corresponds to an image which can be
viewed. A frame may be made up of a number of interlaced
fields, but this is not obligatory as is seen from more recently
proposed video formats, such as those intended for high
definition television. Frames describe the temporal
decomposition of the video image information. Each frame
contains image information structured in terms of lines and
pixels, which represent the spatial decomposition of the
video.

In the present document, the terms "video information" or
"video sequences" refer to data representing a visual image
recorded over a given time period, without reference to the
length of that time period or the structure of the recorded
information. Thus, the term "video sequence" will be used
to refer to any series of video frames, regardless of whether
this series corresponds to a single camera shot (recorded
between two cuts) or to a plurality of shots or scenes.

Traditionally, if a user desired to know what was the
content of a particular video sequence he was obliged to
watch as each frame, or a sub-sample of the frames, of the
sequence was displayed successively in time. (For purposes
of this document, the terms "he," "him," or "his" are used for
convenience in place of she/he, her/him and hers/his, and are
intended to be gender-neutral.) This approach is still
widespread, and in applications where video data is accessed
using a personal computer the interface to the video often
consists of a displayed window in which the video sequence
is contained and a set of displayed controls similar to those
found on a video tape recorder (allowing fast-forward,
rewind, etc.).

Developments in the fields of video indexing and video
editing have provided other forms of interface to video
information.

In the field of video indexing, it is necessary to code
information contained in a video sequence in order to enable
subsequent retrieval of the sequence from a database by
reference to keywords or concepts. The coded content may,
for example, identify the types of objects present in the
video sequence, their properties/motion, the type of camera
movements involved in the video sequence (pan, tracking
shot, zoom, etc.), and other properties. A "summary" of the
coded document may be prepared, consisting of certain
representative frames taken from the sequence, together
with text information or icons indicating how the sequence
has been coded. The interface for interacting with the video
database typically includes a computer input device enabling
the user to specify objects or properties of interest and, in
response to the query, the computer determines which video
sequences in the database correspond to the input search
terms and displays the appropriate "summaries". The user
then indicates whether or not a particular video sequence
should be reproduced. Examples of products using this
approach are described in the article "Advanced Imaging
Product Survey: Photo, Document and Video" from the
journal "Advanced Imaging", October 1994, which document
is incorporated herein by this reference.

In some video indexing schemes, the video sequence is
divided up into shorter series of frames based upon the scene
changes or the semantic content of the video information. A
hierarchical structure may be defined. Index "summaries"
may be produced for the different series of frames
corresponding to nodes in the hierarchical structure. In such a
case, at the time when a search is made, the "summary"
corresponding to a complete video sequence may be
retrieved for display to the user who is then allowed to
request display of "summaries" relating to sub-sections of
the video sequence which are lower down in the hierarchical
structure. If the user so wishes, a selected sequence or
sub-section is reproduced on the display monitor. Such a
scheme is described in EP-A-0 555 028 which is
incorporated herein by this reference.

A disadvantage of such traditional indexing/searching
interfaces to video sequences is that the dynamic quality of
the video information is lost.

Another approach, derived from the field of video editing,
consists of the "digital storyboard". The video sequence is
segmented into scenes and one or more representative
frames from each scene are selected and displayed, usually
accompanied by text information, side-by-side with
representative frames from other segments. The user now has
both a visual overview of all the scenes and a direct visual
access to individual scenes. Each representative frame of the
storyboard can be considered to be an icon. Selection of the
icon via a pointing device (typically a mouse-controlled
cursor) causes the associated video sequence or
sub-sequence to be reproduced. Typical layouts for the
storyboards are two-dimensional arrays or long one-dimensional
strips. In the first case, the user scans the icons from the left
to the right, line by line, whereas in the second case the user
needs to move the strip across the screen.

Digital storyboards are typically created by a video editor
who views the video sequence, segments the data into
individual scenes and places each scene, with a descriptive
comment, onto the storyboard. As is well-known from
technical literature, many steps of this process can be
automated. For example, different techniques for automatic
detection of scene changes are discussed in the following
documents, each of which is incorporated herein by
reference:

"A Real-time neural approach to scene cut detection" by
Ardizzone et al, IS&T/SPIE - Storage & Retrieval for
Video Databases IV, San Jose, Calif.;

"Digital Video Segmentation" by Hampapur et al, ACM
Multimedia '94 Proceedings, ACM Press;

"Extraction of News Articles based on Scene Cut Detection
using DCT Clustering" by Ariki et al, International
Conference on Image Processing, September 1996,
Lausanne, Switzerland;
"Automatic partitioning of full-motion video" by HongJiang
Zhang et al, Multimedia Systems (Springer-Verlag,
1993), 1, pages 10-28; and

EP-A-0 590 759.

Various methods for automatically detecting and tracking
persons and objects in video sequences are considered in the
following documents, each of which is incorporated herein
by reference:

"Modeling, Analysis and Visualization of Nonrigid
Object Motion" by T. S. Huang, Proc. of International Conf.
on Pattern Recognition, Vol. 1, pp 361-364, Atlantic City,
N.J., June 1990; and

"Segmentation of People in Motion" by Shio et al, Proc.
IEEE, vol. 79, pp 325-332, 1991.

Techniques for automatically detecting different types of
camera shot are described in:

"Global zoom/pan estimation and compensation for video
compression" by Tse et al, Proc. ICASSP, Vol. 4, pp
2725-2728, May 1991; and

"Differential estimation of the global motion parameters
zoom and pan" by M. Hoetter, Signal Processing, Vol. 16, pp
249-265, 1989.

In the case of digital storyboards too, the dynamic quality
of the video sequence is often lost or obscured. Some
impression of the movement inherent in the video sequence
can be preserved by selecting several frames to represent
each scene, preferably frames which demonstrate the
movement occurring in that scene. However, storyboard-type
interfaces to video information remain awkward to use in
view of the fact that multiple actions on the user's part are
necessary in order to view and access data.

Attempts have been made to create a single visual image
which represents both the content of individual views making
up a video sequence and preserves the context, that is,
the time-varying nature of the video image information.

One such approach creates a "trace" consisting of a single
frame having superimposed images taken from different
frames of the video sequence, these images being offset one
from the other due to motion occurring between the different
frames from which the images were taken. Thus, for
example, in the case of a video sequence representing a
sprinter running, the corresponding "trace" will include
multiple (probably overlapping) images of the sprinter,
spaced in the direction in which the sprinter is running.
Another approach of this kind generates a composite image,
called a "salient still", representative of the video
sequence (see "Salient Video Stills: Content and Context
Preserved" by Teodosio et al, Proc. ACM Multimedia 93,
California, Aug. 1-6, 1993, pp 39-47), which article is
incorporated herein by this reference in its entirety.

Still another approach of this general type consists in
creation of a "video icon", as described in the papers
"Developing Power Tools for Video Indexing and Retrieval"
by Zhang et al, SPIE, Vol. 2185, pp 140-149; and "Video
Representation tools using a unified object and perspective
based approach" by the present inventors, IS&T/SPIE
Conference on Storage and Retrieval for Image and Video
Databases, San Jose, Calif., February 1995, which are
incorporated herein by reference.

In a "video icon", as illustrated in FIG. 1A, the scene is
represented by a number of frames selected from the
sequence and which are displayed as if they were stacked up
one behind the other in the z-direction and are viewed in
perspective. In other words, each individual frame is
represented by a plane and the planes lie one behind the other with
a slight offset. Typically the first frame of the stack is
displayed in its entirety whereas underlying frames are
partially occluded by the frames in front. The envelope of
the stack of frames has a parallelepiped shape. The use of a
[illegible] visual understanding. Furthermore, with some such
icons, [illegible] directly access any frame represented in the
icon.

[illegible] "object based" video icons and video icons
containing a representation of camera movement. In an "object
based" video icon, as illustrated in FIG. 1B, objects of interest
are isolated in the individual frames and, for at least some of the
stacked frames, the only image information included in the
video icon is the image information corresponding to the
selected object. In such a video icon, at least some of the
individual frames are represented as if they were transparent
except in the regions containing the selected object. Video
icons containing an indication of camera movement may
have, as illustrated in the example of FIG. 1C, a
serpentine-shaped envelope corresponding to the case of
side-to-side motion of the camera.

The video icons discussed above present the user with
information concerning the content of the whole of a video
sequence and serve as a selection tool allowing the user to
access frames of the video sequence out of the usual order.
In other words, these icons allow non-sequential access to
the video sequence. Nevertheless, the ways in which the user
can interact with the video sequence information are strictly
limited. The user can select frames for playback in a
non-sequential way but he has little or no means of obtaining
a deeper level of information concerning the video sequence
as a whole, short of watching a playback of the whole
sequence.

The present invention provides a novel type of interface
to video information which allows the user to access
information concerning a video sequence in a highly versatile
manner. In particular, interactive video interfaces of the
present invention enable a user to obtain deeper levels of
information concerning an associated video sequence at
positions in the sequence which are designated by the user
as being of interest.

The present invention provides an interface to information
concerning an associated video sequence, one such interface
comprising:

information defining a three-dimensional root image, the
root image consisting of a plurality of basic frames selected
from said video sequence, and/or a plurality of portions of
video frames corresponding to selected objects represented
in the video sequence, x and y directions in the root image
corresponding to x and y directions in the video frames and
the z direction in the root image corresponding to the time
axis whereby the basic frames are spaced apart from one
another in the z direction of the root image by distances
corresponding to the time separation between the respective
video frames;

means for displaying views of the root image;

means for designating a viewing position relative to said
root image; and

means for calculating image data representing said
three-dimensional root image viewed from the designated viewing
position, and for outputting said calculated image data to the
displaying means.

According to the present invention, customized user
interfaces may be created for video sequences. These interfaces
comprise a displayable "root" image which directly
represents the content and context of the image information in the
video sequence and can be manipulated, either automatically
or by the user, in order to display further image information,
by designation of a viewing position with respect to the root
image, the representation of the displayed image being
changed in response to changes in the designated viewing
position. In a preferred embodiment of the present invention,
the representation of the displayed image changes dependent
upon the designated viewing position as if the root image
were a three-dimensional object. In such preferred
embodiments, as the designated viewing position changes,
the data necessary to form the displayed representation of
the root image is calculated so as to provide the correct
perspective view given the viewing angle, the distance
separating the viewing position from the displayed
quasi-object and whether the viewing position is above or below
the displayed quasi-object.

In a reduced form, the present invention can provide
non-interactive interfaces to video sequences, in which the
root image information is packaged with an associated script
defining a routine for automatically displaying a sequence of
different views of the root image and performing a set of
manipulations on the displayed image, no user manipulation
being permitted. However, the full benefits of the invention
are best seen in interactive interfaces where the viewing
position of the root image is designated by the user, as
follows. When the user first accesses the interface he is
presented with a displayed image which represents the root
image seen from a particular viewpoint (which may be a
predetermined reference viewpoint). As he designates
different viewing angles, the displayed image represents the
root image seen from different perspectives. When the user
designates viewing positions at greater or lesser distances
from the root image, the displayed image increases or
reduces the size and, preferably, resolution of the displayed
information, accessing image data from additional video
frames, if need be.

The customized, interactive interfaces provided by the
present invention involve displayed images, representing the
respective associated video sequences, which, in some ways,
could be considered to be a navigable environment or a
manipulable object. This environment or object is a
quasi-three-dimensional entity. The x and y dimensions of the
environment/object correspond to true spatial dimensions
(corresponding to the x and y directions in the associated
video frames) whereas the z dimension of the
environment/object corresponds to the time axis. These interfaces could
be considered to constitute a development of the "video
icons" discussed above, now rendered interactive and
manipulable by the user.

With the interfaces provided by the present invention, the
user can select spatial and temporal information from a
video sequence for access by designating a viewing position
with respect to a video icon representing the video sequence.
Arbitrarily chosen oblique "viewing directions" are possible
whereby the user simultaneously accesses image
information corresponding to portions of a number of different
frames in the video sequence. As the user's viewing position
relative to the video icon changes, the amount of a given
frame which is visible to him, and the number and selection
of frames which he can see, changes correspondingly.

As mentioned above, the interactive video interfaces of
the present invention make use of a "root" image comprising
a plurality of basic frames arranged to form a
quasi-three-dimensional object. It is preferred that the relative placement
positions of the basic frames be arranged so as to indicate
visually some underlying motion in the video sequence.
Thus, for example, if the video sequence corresponds to a
travelling shot moving down a hallway and turning a corner,
the envelope of the set of basic frames preferably does not
have a parallelepiped shape but, instead, composes a "pipe"
of rectangular section and bending, in a way corresponding
to the camera travel during filming of the video sequence.

In preferred embodiments of video interfaces according to
the present invention, the video frames making up the
root image are chosen as a function of the amount of motion
or change in the sequence. For example, in the case of a
video sequence corresponding to a travelling shot, in which
the background information changes, it is preferable that
successive basic frames should include background
information overlapping by, say, 50%.

In certain embodiments of the present invention, the root
image corresponds to an "object-based video icon." In other
words, certain of the basic frames included in the root image
are not included therein in full; only those portions
corresponding to selected objects are included. Alternatively, or
additionally, certain basic frames may be included in full in
the root image but may include "hot objects," that is,
representations of objects selectable by the user. In response
to selection of such "hot objects" by the user, the
corresponding basic frames (and, if necessary, additional frames)
are displayed as if they had become transparent at all
portions thereof except the portion(s) where the selected
object or objects are displayed. The presence of such
selectable objects in the root image allows the user to selectively
isolate objects of interest in the video sequence and obtain
at a glance a visual impression of the appearance and
movement of the objects during the video sequence.

The interfaces of the present invention allow the user to
select an arbitrary portion of the video sequence for
playback. The user designates a portion of the video sequence
which is of interest, by designating a corresponding portion
of the displayed image forming part of the interface to the
video sequence. This portion of the video sequence is then
played back. The interface may include a displayed set of
controls similar to those provided on a VCR in order to
permit the user to select different modes for this playback,
such as fast-forward, rewind, etc.

In preferred embodiments of interfaces according to the
invention, the displayed image forming part of the interface
remains visible whilst the designated portion of the sequence
is being played back. This can be achieved in any number of
ways, as for example, by providing a second display device
upon which the playback takes place, or by designating a
"playback window" on the display screen, this playback
window being offset with respect to the screen area used by
the interface, or by any other suitable means.

The preferred embodiments of interfaces according to the
invention also permit the user to designate an object of
interest and to select a playback mode in which only image
information concerning that selected object is included in
the playback. Furthermore, the user can select a single frame
from the video sequence for display separately from the
interactive displayed image generated by the interface.

In preferred embodiments, the interfaces of the present
invention allow the user to generate a displayed image
corresponding to a distortion of the root image. More
especially, the displayed image can correspond to the root
image subjected to an "accordion effect", where the root
image is "cracked open", for example, by bending around a
bend line so as to "fan out" video frames in the vicinity of
the opening point, or is modified by linearly spreading apart
video frames at a point of interest. The accordion effect can
also be applied repetitively or otherwise in a nested fashion
according to the present invention.

The present invention can provide user interfaces to
"multi-threaded" video sequences, that is, video sequences
consisting of numerous interrelated shorter segments such as
are found, for example, in a video game where the user's
choices change the scene which is displayed. Interfaces to
such multi-threaded video sequences can include frames of
the different video segments in the root image, such that the
root image has a branching structure. Alternatively, some or
all of the different threads may not be visible in the root
image but may become visible as a result of user
manipulation. For example, if the user expresses an interest in a
particular region of the video sequence by designating a
portion of a displayed root image using a pointing device
(such as a mouse, or by touching a touch screen, etc.) then
if multiple different threads of the sequence start from the
designated area, image portions for these different threads
may be added to the displayed image.

In preferred embodiments of interfaces according to the
present invention, the root image for the video sequence
concerned is associated with information defining how the
corresponding displayed image will change in response to
given types of user manipulation. Thus, for example, this
associated information may define how many, or which
additional frames are displayed when the user moves the
viewing position closer up to the root image. Similarly, the
associated information may identify which objects in the
scene are "hot objects" and what image information will be
displayed in relation to these hot objects when activated by
the user.

Furthermore, different possibilities exist for delivering the
components of the interface to the end user. In an application
where video sequences are transmitted to a user over a
telecommunications path, such as via the Internet, the user
who is interested in a particular video sequence may first
download only certain components of the associated
interface. First of all he downloads information for generating a
displayed view of the root image, together with an
associated application program (if he does not already have an
appropriate "interface player" loaded in his computer). The
downloaded (or already-resident) application program
includes basic routines for changing the perspective of the
displayed image in response to changes in the viewing
position designated by the user. The application program is
also adapted to consult any "associated information" (as
mentioned above) which forms part of the interface and
conditions the way in which the displayed image changes in
response to certain predetermined user manipulations (such
as "zoom-in" and "activate object"). If the interface does not
contain any such "associated information" then the
application program makes use of pre-set default parameters.

The root image corresponds to a particular set of basic
video frames and information designating relative
placement positions thereof. The root image information
downloaded to the user may include just the data necessary to
create a reference view of the root image or it may include
the image data for the set of basic frames (in order to enable
the changes in user viewing angle to be catered for without
the need to download additional information). In a case
where the user performs a manipulation which requires
display of video information which is not present in the root
image (e.g. he "zooms in" such that data from additional
frames is required), this extra information can either be
pre-packaged and supplied with the root image information
or the extra information can be downloaded from the host
website as and when it is needed.

Similar possibilities exist in the case of interfaces
provided on CD-ROM. In general, the root image and other
associated information will be provided on the CD-ROM in
addition to the full video sequence. However, it is to be
understood that, for reasons of space saving, catalogues of
video sequences could be made consisting solely of
interfaces, without the corresponding full video sequences.

In addition to providing the interfaces themselves, the
present invention also provides apparatus for creation of
interfaces according to the present invention. This may be
dedicated hardware or, more preferably, a computer system
programmed in accordance with specially designed
computer programs.

Various of the steps involved in creation of a customized
interface according to the present invention can be
automated. Thus, for example, the selection of basic frames for
inclusion in the "root image" of the interface can be made
automatically according to one of a number of different
algorithms, such as choosing one frame every n frames, or
choosing 1 frame every time the camera movement has
displaced the background by m%, etc. Similarly, the relative
placement positions of the basic frames in the root image can
be set automatically taking into account the time separation
between those frames and, if desired, other factors such as
camera motion. Similarly, the presence of objects or people
in the video sequence can be detected automatically
according to one of the known algorithms (such as those discussed
in the references cited above), and an "object oriented" root
image can be created automatically. Thus, in some
embodiments, the interface creation apparatus of the present
invention has the capability of automatically processing
video sequence information in order to produce a root
image. Such embodiments include means for associating
with the root image a standard set of routines for changing
the displayed image in response to user
manipulations.

However, it is often preferable actively to design the
characteristics of interactive interfaces according to the
invention, such that the ways in which the end user can
interact with the video information are limited or channeled
in [illegible] directions. This is particularly true in the case of
video sequences which are advertisements or are used in
educational software and the like.

Thus, the present invention provides a toolkit for use in
creation of customized interfaces. In preferred
embodiments, the toolkit enables a designer to tailor the
configuration and content of the root image, as well as to
specify which objects in the video sequence are "hot
objects" and to control the way in which the displayed
interface image will change in response to manipulation by
an end user. Thus, among other things, the toolkit enables the
interface designer to determine which frames of the video
sequence should be used as basic frames in the root image,
and how many additional frames are added to the displayed
image when the user designates a viewing position close to
the root image.

DESCRIPTION OF THE DRAWINGS

Further features and advantages of the present invention
will become apparent from the following description of
preferred embodiments thereof, given by way of example,
and illustrated by the accompanying drawings, in which:

FIGS. 1A-C illustrate various types of video icon,
wherein FIG. 1A shows an ordinary video icon, FIG. 1B
shows an object-based video icon and FIG. 1C shows a
video icon including a representation of camera motion;

FIG. 2 is a block diagram indicating the components of an
interactive interface according to a first embodiment of the
present invention;




FIG. 3 is a diagram illustrating the content of the interface 
data file (FDI) used in the first embodiment of the invention; 

FIG. 4 is a diagram illustrating a reference view of a root
image and three viewing positions designated by a user;

FIGS. 5A-C illustrate the displayed image in the case of
the root image viewed from the different viewing positions
of FIG. 4, wherein FIG. 5A represents the displayed image
from viewing position A, wherein FIG. 5B represents the
displayed image from viewing position B, and wherein FIG.
5C represents the displayed image from viewing position C;

FIGS. 6A-B illustrate displayed images based on more
complex root images according to the present invention, in
which FIG. 6A is derived from a root image visually
representing motion and FIG. 6B is derived from a root
image visually representing a zoom effect;

FIGS. 7A-B illustrate the effect of user selection of an
object represented in the displayed image, in a second
embodiment of interface according to the present invention;

FIG. 8 illustrates a user manipulation of a root image to
produce an "accordion effect";

FIG. 9 illustrates a displayed image corresponding to a 
view of a branching root image associated with a multi- 
threaded scenario; 

FIG. 10 is a flow diagram indicating steps in a preferred 
process of designing an interface according to the present 
invention; 

FIG. 11 is a schematic representation of a preferred 
embodiment of an interface editor unit according to the 
present invention; and 

FIG. 12 is a schematic representation of a preferred 
embodiment of an interface viewer according to the present 
invention. 

DETAILED DESCRIPTION 

The components of an interactive interface according to a 
first preferred embodiment of the present invention will now 
be described with reference to FIG. 2. In this example, an 
interactive interface of the invention is associated with video 
sequences recorded on a CD-ROM. 

As shown in FIG. 2, a CD-ROM reader 1 is connected to 
a computer system including a central processor portion 2, 
a display screen 3, and a user-operable input device which, 
in this case, includes a keyboard 4 and a mouse 5. When the 
user wishes to consult video sequences recorded on a 
CD-ROM 7, he places the CD-ROM 7 in the CD-ROM 
reader and activates CD-ROM accessing software provided 
in the central processor portion 2 or an associated memory 
or unit. 

According to the first embodiment of the invention, the 
CD-ROM 7 has recorded thereon not only the video sequence
image information 8 (in any convenient format), but also a
respective interface data file (FDI) 10 for each video
sequence, together with a video interface application pro- 
gram 11. The content of a typical data file is illustrated in 
FIG. 3. Respective scripts 12 are optionally associated with 
the interface data files. When data on the CD-ROM is to be 
read, the video interface application program 11 is operated 
by the central processor portion 2 of the computer system 
and the interface data file applicable to the video sequence 
selected by the user is processed in order to cause an 
interactive video icon (see, for example, FIGS. 4 and 5) to 
be displayed on the display screen 3. The user can then 
manipulate the displayed icon, by making use of the mouse 
or keyboard input devices, in order to explore the selected 
video sequence.

The types of manipulations of the interactive video icon
which are available to the user will now be described with 
reference to FIGS. 4 to 9. 

FIG. 4 illustrates a simple interactive video icon according
to the present invention. In particular, this video icon is
represented on the display screen as a set of superposed
images arranged within an envelope having the shape of a
regular parallelepiped. Each of the superposed images
corresponds to a video frame selected from the video sequence,
but these frames are offset from one another. It may be
considered that the displayed image corresponds to a cuboid
viewed from a particular viewing position (above and to the
right, in this example). This cuboid is a theoretical construct
consisting of the set of selected video frames disposed such
that their respective x and y axes correspond to the x and y
axes of the cuboid and the z axis of the cuboid corresponds
to the time axis. Thus, in the theoretical construct cuboid, the
selected frames are spaced apart in the z direction in
accordance with their respective time separations in the
video sequence.
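
The spacing rule for the cuboid can be illustrated with a minimal
sketch. It assumes that each selected frame is identified by its
frame number and that the sequence has a constant frame rate; the
function name, the default frame rate and the depth_per_second
scale factor are hypothetical and are not taken from the
specification.

```python
def frame_depths(frame_numbers, fps=25.0, depth_per_second=10.0):
    """Return one z coordinate per selected frame.

    Each z value is proportional to the time elapsed since the first
    selected frame, so frames that are far apart in time are also far
    apart along the z axis of the cuboid.
    """
    if not frame_numbers:
        return []
    first = frame_numbers[0]
    # (frame offset / fps) is the time separation in seconds.
    return [(n - first) / fps * depth_per_second for n in frame_numbers]


# Hypothetical selection: frames 0, 50, 75 and 200 of a 25 fps sequence.
depths = frame_depths([0, 50, 75, 200])
```

With these assumed values the four frames lie 0, 2, 3 and 8 seconds
into the selection, so unequal time gaps appear as unequal gaps in
depth.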

When the user seeks to explore the video sequence via the 
interactive video icon displayed on the display screen, one 
of the basic operations he can perform is to designate a
position on the screen as a viewing position relative to the 
displayed image (e.g. by "clicking" with the computer 
mouse). In FIG. 4, three such designated viewing positions 
are indicated by the letters A, B and C. In response to this 
operation by the user, the displayed image is changed to the 
form shown in FIG. 5: FIGS. 5A, 5B and 5C correspond to
"viewing positions" A, B and C, respectively, of FIG. 4. The 
image displayed to the user changes so as to provide a 
perspective view of the theoretical cuboid as seen from an 
angle corresponding to the viewing position designated by 
the user. 

The above-mentioned cuboid is a special case of a "root
image" according to the present invention. This "root
image" is derived from the video sequence and conveys
information concerning both the image content of the
selected sub-set of frames (called below, "basic frames") and
the relative "position" of that image information in time as
well as space. It is to be appreciated that the "root image" is
defined by information in the interface data file. The
definition specifies which video frames are "basic frames" (for
example, by storing the relevant frame numbers), as well as
specifying the placement positions of the basic frames
relative to one another within the root image.
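
A root image definition of this kind might be held in memory as
sketched below. This is only an illustration: the class names, field
names and example values are hypothetical, and the actual layout of
the interface data file is the one illustrated in FIG. 3.

```python
from dataclasses import dataclass, field


@dataclass
class BasicFrame:
    frame_number: int   # which frame of the video sequence this is
    x: float = 0.0      # placement offset within the root image
    y: float = 0.0
    z: float = 0.0      # position along the time (z) axis


@dataclass
class RootImageDefinition:
    basic_frames: list = field(default_factory=list)

    def frame_numbers(self):
        """Frame numbers of the basic frames, in stored order."""
        return [bf.frame_number for bf in self.basic_frames]


# Hypothetical definition with three basic frames.
root = RootImageDefinition([
    BasicFrame(0),
    BasicFrame(50, z=20.0),
    BasicFrame(75, x=5.0, z=30.0),
])
```

Storing only frame numbers and placement positions keeps the
definition small, since the image data itself remains in the video
sequence.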

The central processor portion 2 of the computer system
calculates the image data required to generate the displayed
image from the root image definition contained in the
appropriate interface data file, image data of the basic frames
(and, where required, additional frames) and the viewing
position designated by the user, using standard ray-tracing
techniques. The data required to generate the displayed
image is loaded into the video buffer and displayed on the
display screen.

According to the present invention it is preferred that,
when the user designates a viewing position close up to the
interactive video icon, the image information in the area of
interest should be enriched. This is achieved by including, in
the displayed image, image data relating to additional video
frames besides the basic video frames. Such a case is
illustrated in FIG. 5B, where the basic frames BF5 and BF6
are displayed together with additional frames AF1 and AF2.
As the user-designated viewing position approaches closer
and closer to the displayed image the video interface
application program causes closely spaced additional frames to
be added to the displayed image. Ultimately, successive
video frames of the video sequence may be included in the
displayed image. As is clear from FIG. 5B, image information
corresponding to parts of the root image distant from the
area of interest may be omitted from the displayed
"close-up" image.

Preferably, the interface data file includes data specifying
how the choice should be made of additional frames to be
added as the user "moves close up" to the displayed image.
More preferably, this data defines rules governing the choice
of how many, and which, additional frames should be used
to enrich the displayed image as the designated viewing
position changes. These rules can, for example, define a
mathematical relationship between the number of displayed
frames and the distance separating the designated viewing
position and the displayed quasi-object. In preferred
embodiments of the invention, the number of frames which
are added to the display as the viewing position approaches
the displayed quasi-object depends upon the amount of
motion or change in the video sequence at that location.
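
The specification does not give a particular formula for these
rules. As a minimal sketch of one possible rule, the function below
assumes a linear relationship between closeness and frame count,
weighted by a local activity value between 0 and 1. The function
name, the near/far thresholds and the max_extra limit are all
hypothetical.

```python
def additional_frame_count(distance, activity,
                           max_extra=8, near=1.0, far=10.0):
    """Number of additional frames to show near the area of interest.

    distance: separation between the designated viewing position and
              the displayed quasi-object (arbitrary units)
    activity: amount of motion or change at that location, 0.0 to 1.0
    """
    if distance >= far:
        return 0  # viewed from afar: basic frames only
    # closeness is 0.0 at `far` and saturates at 1.0 at or inside `near`
    closeness = min(1.0, (far - distance) / (far - near))
    activity = max(0.0, min(1.0, activity))
    return round(max_extra * closeness * activity)
```

Under these assumptions a distant viewing position adds no frames,
while a close-up view of a high-activity passage adds the maximum
number.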

The example illustrated in FIG. 4 is a simplification in 
which the displayed image corresponds to a root image 
having a simple, cuboid shape. However, according to the 
present invention, the root image may have a variety of 
different forms. 

For example, the relative placement positions of the basic
frames may be selected such that the envelope of the root
image has a shape which reflects motion in the corresponding
video sequence (either camera motion, during tracking
shots and the like, or motion of objects represented in the
sequence); see the corresponding interactive icon shown in
FIG. 6A. Similarly, the dimensions of the basic frames in the
root image may be scaled so as to visually represent a zoom
effect occurring in the video sequence; see the corresponding
interactive icon shown in FIG. 6B.

It will be seen that the interactive icon represented in FIG. 
6B includes certain frames for which only a portion of the 
image information has been displayed. This corresponds to 
a case where an object of special interest has been selected. 
Such object selection can be made in various ways. If 
desired, the root image may be designed such that, instead 
of including basic frames in full, only those portions of 
frames which represent a particular object are included. This 
involves a choice being made, at the time of design of the 
root image portion of the interface, concerning which 
objects are interesting. The designer can alternatively or 
additionally decide that the root image will include basic 
frames in full but that certain objects represented in the 
video sequence are to be "selectable" or "extractable" at user 
request. This feature will now be discussed with reference to 
FIG. 7. 

FIG. 7A illustrates an initial view presented to a user 
when he consults the interface for a particular selected video 
sequence. In this sequence two people walk towards each 
other and their paths cross. The designer of the interface has
decided that the two people are objects that may be of 
interest to the end user. Accordingly, he has included, in the 
interface data file, information designating these objects as 
"extractable". This designation information may correspond 
to x, y co-ordinate range information identifying the position 
of the object in each video frame (or a subset of frames). 
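
The following sketch shows how x, y co-ordinate ranges of this kind
could be used to make everything outside an extractable object
transparent. It is an illustration only: frames are modelled as
lists of pixel rows, None stands for a transparent pixel, and the
function name and the per-frame dictionary are hypothetical.

```python
def extract_object(frame, region):
    """Keep only the pixels inside `region`; make the rest transparent.

    frame:  list of rows, each row a list of pixel values
    region: (x_min, x_max, y_min, y_max), all bounds inclusive
    """
    x_min, x_max, y_min, y_max = region
    return [
        [pixel if (x_min <= x <= x_max and y_min <= y <= y_max) else None
         for x, pixel in enumerate(row)]
        for y, row in enumerate(frame)
    ]


# Hypothetical designation information: frame number -> co-ordinate range.
extractable = {0: (1, 2, 0, 1)}

frame0 = [[1, 2, 3, 4],
          [5, 6, 7, 8],
          [9, 10, 11, 12]]
masked = extract_object(frame0, extractable[0])
```

Because the original frame is left untouched, the transparent
portions can be restored simply by displaying the unmasked frame
again.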

If the user expresses an interest in either of the two
objects, for example, by designating a screen position
corresponding to one of the objects (e.g. by "clicking" on the
left-hand person using the right-hand mouse button), then
the interface application program controls the displayed
image such that extraneous portions of the displayed frames
disappear from the display, leaving only a representation of
the two people and their motion, as shown in FIG. 7B. Thus,
the objects of interest are "extracted" from their
surroundings. The "missing" or transparent portions of the displayed
frames can be restored to the displayed image at the user's
demand (e.g. by a further "click" of the mouse button).

It is to be understood that, according to the present
invention, interfaces may be designed such that particular
"extractable" objects may be extracted simultaneously with
some or all of the other extractable objects, or they may be
extracted individually. Sophisticated interfaces according to
the present invention can incorporate object-extraction
routines permitting the user to arbitrarily select objects visible
in the displayed view of the root image, for extraction. Thus,
for example, the user may use a pointing device to create a
frame around an object visible in a displayed view of the
root image and the application program then provides
analysis routines permitting identification of the designated object
in the other basic frames of the root image (and, if required,
in additional frames) so as to cause display of that selected
object as if it were located on transparent frames.

It may be desirable to allow the user to obtain a close-up
view of a particular portion of the interactive video icon in
a manner which does not correspond to a strict perspective
view of the region concerned. Preferred embodiments of
interface according to the invention thus provide a so-called
"accordion" effect, as illustrated in FIG. 8. When the user
manipulates the icon by an "accordion" effect at a particular
point, the basic frames in the vicinity of the region of interest
are spread so as to provide the user with a better view.
Further, preferably, the function of displaying additional
frames so as to increase detail is inhibited during the
"accordion" effect.

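The specification does not state how the frames are repositioned
during the "accordion" effect. One possible mapping, given purely as
a sketch, stretches the z spacing of frames within a chosen radius
of the point of interest and shifts the remaining frames outwards so
that the frame order is preserved. The function name and the radius
and factor parameters are hypothetical.

```python
def accordion(z_positions, focus, radius, factor=2.0):
    """Spread frames lying within `radius` of `focus` along the z axis.

    Frames inside the region are moved away from the focus by `factor`;
    frames outside are shifted by a constant so that no frame overtakes
    another.
    """
    shift = radius * (factor - 1.0)
    spread = []
    for z in z_positions:
        d = z - focus
        if abs(d) <= radius:
            spread.append(focus + d * factor)   # stretched region
        elif d > 0:
            spread.append(z + shift)            # pushed further back
        else:
            spread.append(z - shift)            # pushed further forward
    return spread


# Five equally spaced frames, accordion applied around z = 20.
opened = accordion([0, 10, 20, 30, 40], focus=20, radius=10)
```

In this example the gaps next to the focus double from 10 to 20
units while the outer gaps stay at 10 units.
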
In the case of "multi-threaded" video sequences, such as
are traditionally found in video-based computer games and
educational software and which involve parallel video
subsequences which are accessed alternatively depending upon
the user's choices, these too can be the subject of interfaces
according to the present invention. In such a case, the
interface designer may choose to include frames from
different parallel video subsequences in the interface's root
image in order to give the user an idea of the different plot
strands available to him in the video sequence. FIG. 9
illustrates an interactive video icon derived from a simple
example of such a root image.

Alternatively, or additionally, the designer may create
secondary root images for the respective sub-sequences,
these secondary root images being used to generate the
displayed image only when the user designates a viewing
position close to the video frame where the sub-sequence
begins. In the case of interfaces to such computer games or
educational software, this is a logical choice since it is at the
point where the video sub-sequence branches from the main
sequence that user choices during playing of the game, or
using of the educational software, change the experienced
scenario.

Another manipulation which it is preferable to include in
interfaces according to the invention is the traditional set of
displayed VCR controls which permit the user to play back
the video sequence with which the displayed video icon is
associated. Furthermore, the user can select for playback
portions or frames within the sequence by, for example,
"clicking" with the mouse button on the frames of interest as
displayed in the interactive video icon. The video playback
can take place on a separate display screen or on a window
defined on the display screen displaying the video icon.

As mentioned above, a particular video sequence may be
associated with an interface data file and a script. The script
is a routine defined by the interface designer which leads the
user through the use of the interface. The script can, for
example, consist of a routine to cause an automatic
demonstration of the different manipulations possible of the
displayed quasi-object. The user can alter the running of the
script in the usual way, for example by pausing it, slowing
it down, etc.

The script may, if desired, include additional text, sound
or graphic information which can be reproduced in
association with the displayed view of the root image either
automatically or in response to operations performed by the
end user. Script functionality according to the present
invention allows creation and editing of viewing scenarios that
may subsequently be played, in part or in whole,
automatically, or interactively with user inputs. For
example, in a completely automatic mode, the user can
cause the scenario to begin to play by itself and take the user
through the scenario and any associated information by
simply reading the scenario and changing the view. In other
situations the script may call for interaction by the user, such
as to initiate a transaction. In this case the user may be asked
to specify information, e.g. if he wants to purchase the video
or any other items associated with what has been viewed. In
yet other situations the editor may leave visible tags which
when activated by the user will cause some information to
be displayed on the display device; e.g. associated text,
graphics, video, or sound files which are played through the
speakers of the display device. In certain cases these tags are
attached to objects selected and extracted from the video
sequence, such as so-called "hot objects" according to the
present invention.

FIG. 10 is a flow diagram illustrating typical stages in the
design of an interface according to the present invention, in
the case where a designer is involved. It is to be understood
that interfaces according to the present invention can also be
generated entirely automatically. It will be noted that the
designer's choices affect, notably, the content of the
interface data file. It is to be understood, also, that not all of the
steps illustrated in FIG. 10 are necessarily required; for
example, steps concerning creation of secondary root
images can be omitted in the case of a video sequence which
is not multithreaded. Similarly, it may be desirable to include
in the interface design process certain supplementary steps
which are not shown in FIG. 10. Thus, for example, it is
often desirable to include in the interface data file (as
indicated in the example of FIG. 3) information regarding
the camera motion, cuts, etc. present in the video sequence.
During use of the interface, this information can permit, for
example, additional video frames to be added to the
displayed image and positioned so as to provide a visual
representation of the camera motion. During the interface
design process the information on the characteristics of the
video sequence can be determined either automatically
(using known cut-detection techniques and the like) and/or
may be specified by the interface designer. It may also be
desirable to include in the interface data file information
which allows the sequence, or scripting for it, to be indexed
and retrieved. Preferably, the interface or sequence is
accessed using such information applied according to a
traditional method, such as standard database query
language or through a browser via a channel or network; the
interface data may be downloaded in its entirety or fetched
on an as needed basis.

The present invention provides toolkits for use by
designers wishing to create an interactive video interface according
to the present invention. These toolkits are preferably
implemented as a computer program for running on a general
purpose computer. The toolkits present the designer with
displayed menus and instructions to lead him through a
process including steps such as the typical sequence
illustrated in FIG. 10.

[The designer first indicates the video sequence for which
an interface is to be created,] for example by
typing in the name of a stored file containing the video
sequence information. Preferably, the toolkit accesses this
video information for display in a window on the
[screen for] consultation by the designer during the interface
[design process. In] preferred embodiments of the
[toolkit, the designer makes his choice] of basic frames/
objects for the root image, extractable objects and the like by
[stepping through] the video sequence and, for
example, using a mouse to place a cursor on frames or
portions of frames which are of interest. The toolkit logs the
frame number (and x, y locations of regions in a frame,
where appropriate) of the frames/frame portions indicated
by the designer and associates this positional information
with the appropriate parameter being defined. Preferably, at
[the end of the interface design process, the designer is]
presented with a displayed view of the root image for
manipulation so that he may determine whether any changes
[to the] interface data file are required.

[Different versions of the application program may be]
associated with the interface data file (and script, if present)
depending upon the interface functions which are to be
supported. Thus, if no script is associated with the interface
data file, the application program does not require routines
handling the running of scripts. Similarly, if the interface
data file does not permit an accordion effect to be performed,
the application program does not need
to include routines required for calculating display
information for such effects. If the interface designer believes that
the end user is likely already to have an application program
suitable for running interfaces according to the present
invention then he may choose not to package an application
program with the interface data file or else to associate with
the interface data file merely information which identifies a
suitable version of application program for running this
particular interface.

The present invention has been described above in
connection with video sequences stored on CD-ROM. It is to be
understood that the present invention can be realized in
numerous other applications. The content of the interface
[data file and the] elements of the interface which are present
[may vary] depending upon
the application.

For example, in an application where a video sequence is
provided at a web-site, the user may first download via his
telecommunications connection just the interface data file
applicable to the sequence. If the user does not already have
software suitable for handling manipulation of the
interactive video icon then he will also download the corresponding
application program. As the user manipulates the interactive
video icon, any extra image information that he may require
which has not already been downloaded can be downloaded
in a dynamic fashion as required.

This process can be audited according to the present
invention if desired. The user's interaction with the interface
can be audited, and he can interact with the transaction/audit
functionality for example to supply any information required
by a script which may then be recorded and stored.
Depending upon the application, the transaction/audit information
can be stored and made available for externally (optional)
located auditing and transaction processing facilities/
applications. In a typical situation, the auditing information
can be transmitted at the end of a session whereas the
transaction information may be performed on-line, i.e. the
transaction information is submitted during the session. Real
time transmission can also occur according to the present
invention, however.

Another example is the case of a catalogue on CD-ROM
including only interfaces rather than the associated video
sequences, in order to save space. In such a case, rather than
including a pointer to the image information of the basic
frames of the root image, the interface data file includes
the image information. Some additional image information
may also be provided.

The following disclosure relates to a preferred
implementation according to the present invention, with reference to
FIGS. 11 and 12.

A. Interface Editor Unit

Editors, readers and viewers according to the present
invention can be implemented in hardware, hardware/
software hybrid, or as software on a dedicated platform, a
workstation, a personal computer, or any other hardware.
Different units implemented in software run on a CPU or
graphics boards or other conventional hardware in a
conventional manner, and the various storage devices can be
general purpose computer storage devices such as magnetic
disks, CD-ROMs, DVD, etc.

With reference to FIG. 11, the editor connects to a
database manager (101) and selects a video document and
any other documents to be included in the interface by using
a data chooser unit (102). The database manager may be
implemented in various ways; e.g., as a simple file structure
or even as a complete multimedia database. The data storage
(100) contains the video data and any other information/
documents required and can be implemented in various
modes; e.g., in a simple stand-alone mode of operation it
could be a CD-ROM or in a networked application it could
be implemented as a bank of video servers. Typically the
user operating through the user interaction unit (120) is first
presented a list of available videos or uses a standard
database query language to choose the desired video and
then chooses any other documents required.

The creation of an interface using the editor is discussed
below in three phases: (1) Analysis, (2) Visual layout and (3)
Effects creation.

1. Analysis.

The video document chosen by the editor is first
processed by the activity measure unit (103). The activity
measure unit is responsible for computing various
parameters related to the motion and changes in the video. This
unit typically will implement one of a number of known
techniques for measuring changes, e.g., by calculating the
statistics of the differences between frames, by tracking
objects in motion, or by estimating camera motions by
separating foreground and background portions of the
image. In other implementations this unit may use motion
vector information stored in an MPEG-encoded sequence to
detect important frames of activity in the video document.
The activity measures template store is optional but would
contain templates which can be used to calculate the frame
ranking measure and could be specified by the user through
the user interaction unit.

These parameters are then used to calculate a frame
ranking measure which ranks the different frames as to
whether they should be included in the interface. The frame
ranking measure is derived heuristically from these
measures [e.g., by normalizing the values and taking an average
of the parameters, and can be tailored for different kinds of
sequences (traveling shots, single objects in motion, etc) or
applications]. The editor may choose a pre-defined set of
parameters from the activity measures template store (108)
to detect or highlight a specific kind of activity (rapid
motion, abrupt changes, accelerations, etc.)

The frame ranking measures can be employed by the user
acting through the user interaction unit on the frame
selection unit (104) to select the frames to be included within the
interface. For example, if 10 frames are to be included in the
interface then in default mode the 10 frames corresponding
to the 10 largest frame ranking measures are selected for
inclusion in the interface. The user can then interactively
de-select some of these frames and add other frames.

The camera motion analysis unit (105) is an optional unit
which typically will implement one of a number of known
techniques for measuring camera motion parameters. This
information can be used to determine what shape to give to
the outer envelope of the interface as shown in FIG. 1C; a
default shape, stored in the interface template store (116) can
be chosen. This information may be optionally stored in the
FDI file.

The object selection unit (106A) is responsible for
selecting or detecting individual objects in the video document.
There are various modes possible: in a completely manual
mode the editor may visually select and outline an object of
interest in a given frame through the user interaction unit
(120); in a semi-manual mode, the editor simply points at an
object and chooses from the object templates store (107)
features and associated algorithms to use for extracting and
tracking the chosen object; in another mode the editor may
choose one of a set of pre-defined templates of objects and
known pattern matching techniques are used to detect
whether any objects of interest are present. The user may even
assign a name/identifier to the object and add the object to
the object templates store (107). In this latter case searches
for multiple occurrences of the same object can be initiated
by the user. The information regarding the properties of the
object may be optionally stored in the FDI file.

The object extraction and tracking unit (106B) is now
responsible for extracting the object of interest from the
frame and then tracking it by using known tracking
algorithms. The algorithms used are either chosen by the user or
by default. It is understood that the object selecting,
detection, extraction, and tracking process may be highly
interactive and that the user may be called upon or choose
to intervene in the process a number of times. The
information about the presence and location of objects may be
optionally stored in the FDI file.

In certain applications the FDI file can be made available
to an external program, for example when the interface
editor is associated with an indexing program, the task of
which is to attach indexes (identifiers) to the video
documents, to portions thereof, or to objects located within
the video document.

2. Visual Layout

The user acting through the user interaction unit (120) on
the interface creation unit (109) determines the visual layout
of the interface.

He can shape the outer envelope of the interface in any
way that he desires; two examples are provided in FIGS. 6
and 9; in particular, multiple sequences can be concatenated
and so implement branching effects representing alternatives
to the user. Default shapes are stored in the interface
template store (116). The user can also choose to vary the
spacing of the frames seen on the interface; that is the
distance between frames of the interface as perceived on the
display unit. The user can also insert selections of the
extracted and tracked objects from unit (106B) as illustrated
in FIG. 7B. In this case, the corresponding frames are
rendered transparent except at the locations of the objects.

The different pieces of information generated by the units
described above are gathered together by the interface
creation unit (109) into an FDI file containing a description
of the interface in terms of its layout, i.e. shape and structure,
the image frame numbers and their positions, and, if
available, the extracted features, the ranking of the frames
and the camera motion information. This information is
transmitted to the interface effects creation unit (117).

3. Effects Creation

The editor can also specify three classes of interface
features which serve to convey additional information to the
user and which allow the user to interact with the interface.
The editor performs this specification through the interface
effects creation unit (117).

The zooming effects creation unit (110) is used by the
editor to specify which frames will be made visible, and also
which will be rendered invisible to the user when he moves
up closer to the interface (FIG. 5B) so as to view it from a
new viewing position. The choice of frames to add depends
upon factors such as the distance of the viewing point from
the interface, the degree of motion, the degree of scene
change, the number of frames that can be made visible and
optionally the frame ranking measures calculated by the
activity measure unit (103). The editor can choose to use one
or more of the default zooming effect templates contained in
the zooming effect templates store (113) and assign these in
a differential manner to different parts of the interface;
alternatively the editor can choose to modify these templates
and apply them differentially to the interface.

The special effects creation unit (111) is used by the editor
to create special visual effects on the interface. One such
example is the accordion effect illustrated in FIG. 8 where
parts of the interface are compressed and other parts are
expanded. Another example is illustrated in FIGS. 7A and 7B
where the editor has designated an extractable object and
which is then shown in its extracted form; in other words, the

objects selected and extracted from the video sequence by
units 106A and 106B and become so-called "hot objects." The
editor creates the scripts by calling up templates from the
script effects templates store (115) and instantiating them by
defining the tag and the locations of the information to be
called up.

The interface effects creation unit (117) creates 4 files
which are passed to the interface database manager (118)
which will store these files either remotely or locally as the
case may be: (1) The FDI file, completed by the special
effect and script tags, text and graphics which have been
added to the interface and which are directly visible to the
user. (2) The zoom effect details, scripts and special effects.
(3) The application programs (optional) to view the
interface; i.e., allow the user to view the interface from different
perspectives, traverse the interface, run the script, perform
the special effects, or coded information which indicates
which application program residing on the user's machine
is to perform these operations. (4) The video
sequence and any other associated information (data)
required for reading the interface.

These files are shown stored in storage unit (119) but
depending upon the embodiment they may be physically
located in the same storage device, in separate storage
devices [illegible].

During the editing process, the user/editor can view the
interface under construction, according to the current set of
parameters, templates and designer preferences, on the
interface viewer unit (121) (presented in FIG. 12 and described
below), thus allowing the editor to interactively change its
appearance and features.

B. Interface Viewer Unit

Having chosen an interface through a traditional method,
for example by using a database query language or by using
a browser such as are used for viewing data on the Web, the
interface viewer unit is then employed to read and interact
with the interface.

In a typical application the storage units (201) are
remotely located and accessed through the interface
database manager (202) by way of a communication channel or
network; depending upon the size and characteristics of the
channel and the application the interface data may be loaded
in its entirety or fetched on an as-needed basis,
background is removed. The editor creates the scripts by 45 The data arc then stored in a local memory unit (203) 
calling up templates from the specific effects templates store which may be either a cache memory, a disk store or any 
(114) and instantiating them by defining the positions where other writable storage element. The local memory unit (203) 
the special effect is to take place and by setting the appro- stores the 4 files created by the editor (see above) and in 
priate parameters. addition a transaction/audit file. In certain cases the appli- 

^rhe script effects creation unit (113) aUows the editor of 50 cations programs are already resident in the interface viewer 

the interface to build an interface viewing scenario that may nnit and so do not need to be transmitted, 

be subsequently be played, in part or in whole. The CPU unit (204) fetches the application program, 

automatically, or interactively with user inputs. For deduces which actions need to be performed, and then 

example, in a completely automatic mode when the user fetches the relevant interface information contained in the 

calls up the interface it begins to play by itself and takes the 55 local memory unit (203). Typically the CPU imit fetches the 

user through the interface and any associated information by required application program for the user interaction unit 

simply reading the scenario and changing the view of the (205), the navigation unit (206), and the transaction/audit 

interface. In other situations the script may call for the user unit (207), then interface information is read from the local 

to interact with the interface, e.g. to initiate a transaction. In memory unit (203) passed to the interface renderer unit 

this case the user may be asked to specify information, e.g. 60 (208) which then calculates how the interface is to appear or 

if he wants to purchase the video or any other items be rendered for viewing on the display device (209). 

associated with the interface. In yet other situations the The user interacts with the interface through the user 

editor may leave visible tags which when activated by the interaction unit (205) to the navigation unit (206) and all his 

user will cause some information to be displayed on the actions are audited by the transaction/audit unit (207). In 

display device; e.g. associated text, graphics, video, or 65 addition, the user can interact with the transaction/audit unit 

sound files which are played through the speakers of the (207) for example to supply any information required by the 

display device. In certain cases these tags are attached to script which is then recorded and stored in the transaction/ 
audit portion of the local memory unit (203). Depending
upon the application, this transaction/audit file or a portion
thereof is transmitted by the interface database manager to
the appropriate storage unit (201). This information is then
available to auditing and transaction processing
facilities/applications, which may optionally be located
externally. In a typical
situation, the auditing information is transmitted at the end
of the session whereas the transaction processing may be
performed on-line, i.e. the transaction information is
submitted during the session.
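
By way of illustration only, the split between auditing information held until the end of the session and transaction information submitted during the session might be sketched as follows. The class name TransactionAuditLog, its method names and the send callable are hypothetical and are not part of the described apparatus.

```python
class TransactionAuditLog:
    """Minimal sketch of a transaction/audit buffer (hypothetical design)."""

    def __init__(self, send):
        self._send = send    # hypothetical callable that transmits one record
        self._audit = []     # audit records held locally until the session ends

    def record_action(self, action):
        # User actions are only buffered; nothing is transmitted yet.
        self._audit.append({"kind": "audit", "action": action})

    def submit_transaction(self, details):
        # Transaction information is transmitted on-line, during the session.
        self._send({"kind": "transaction", "details": details})

    def end_session(self):
        # Auditing information is transmitted at the end of the session.
        for record in self._audit:
            self._send(record)
        self._audit.clear()
```

In this sketch the send callable stands in for whatever channel carries records to the storage unit; any transport could be substituted.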

Through the navigation unit (206) the user can choose the
point of view from which to view the interface (or a portion
of the interface). The interface renderer unit (208) then
calculates how the interface is to appear or be rendered for
viewing on the display device (209).
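
As a rough illustration of how a view might be calculated from a designated viewing position, the following sketch places frames along the z direction in proportion to their time separation and projects a point of a frame onto the display plane. The simple pinhole projection, the names BasicFrame, frame_depths and project_point, and the parameters z_per_second and focal are assumptions made for this example; the description does not prescribe a particular projection.

```python
from dataclasses import dataclass


@dataclass
class BasicFrame:
    number: int    # frame index in the video sequence
    time: float    # timestamp in seconds


def frame_depths(frames, z_per_second=1.0):
    """Space frames along z in proportion to their time separation."""
    t0 = frames[0].time
    return [(f.time - t0) * z_per_second for f in frames]


def project_point(x, y, z, eye, focal=1.0):
    """Project (x, y, z) onto the display plane for a viewer at `eye`
    looking along +z (assumed pinhole model)."""
    ex, ey, ez = eye
    depth = z - ez
    if depth <= 0:
        raise ValueError("point lies behind the viewing position")
    scale = focal / depth
    return ((x - ex) * scale, (y - ey) * scale)
```

Moving the viewing position closer to the root image reduces the depth of each frame and so enlarges its projection, which is the behaviour a viewer would expect when approaching a three-dimensional object.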

If the user chooses to zoom in or zoom out, then the zoom 
effects unit (210) fetches the required application program, 
reads the zoom effect parameters stored in the local memory 
store (203), determines the frames to be dropped or added 
and supplies this information (including the additional 
frames if needed) to the interface renderer unit (208) which then
calculates how the interface is to appear or be rendered for 
viewing on the display device (209). 
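
One possible way of determining the frames to be added or dropped as the viewing distance changes is sketched below. It assumes that additional frames have already been ranked (best first) and that the number of additional frames grows linearly as the viewer approaches; the function name visible_frames and the parameters near, far and max_visible are hypothetical and merely stand in for the stored zoom effect parameters.

```python
def visible_frames(basic, ranked_extra, distance,
                   near=1.0, far=10.0, max_visible=20):
    """Return the sorted frame numbers to display at a viewing distance.

    basic        -- frame numbers that are always shown
    ranked_extra -- additional frame numbers, highest ranked first
    """
    # Clamp the distance, then map it to a closeness value in [0, 1].
    d = min(max(distance, near), far)
    closeness = (far - d) / (far - near)
    # Never exceed the number of frames that can be made visible.
    budget = max(0, max_visible - len(basic))
    n_extra = min(len(ranked_extra), int(closeness * budget))
    return sorted(set(basic) | set(ranked_extra[:n_extra]))
```

For example, with basic frames 0, 10 and 20 and max_visible set to 5, a viewer at the far limit sees only the basic frames, while a viewer at the near limit also sees the two highest ranked additional frames.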

If the user chooses to view part of the underlying video 
then the video play effects unit (211) fetches the required
application program, then reads the required video data from 
the local memory unit (203) and plays the video on a second 
display device (209) or in a new window if only one display 
device is available. 

If the user chooses to interact with a hot pre-extracted
object (created by the special effects unit), then the special
effects unit (212) fetches the required application program,
reads the locations of the object, and modifies the
corresponding frames so as to be transparent wherever the
object does not occur; the new frames are passed to the interface
renderer unit (208) which then calculates how the interface
is to appear or be rendered for viewing on the display device 
(209). In cases where the extracted object is to be played as 
a video the frames are passed to the video effects unit (211) 
which then plays the video on a second display device (209) 
or in a new window if only one display device is available. 
Similarly if the user chooses to view an accordion effect then
the special effects unit fetches the accordion effect
parameters stored in the local memory store (203),
determines the frames to be dropped or added,
calculates the relative position of all the frames and
supplies this information
(including the additional frames if needed) to the interface
renderer unit (208) which then calculates how the interface
is to appear or be rendered for viewing on the display device
(209).
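
The calculation of relative frame positions for an accordion effect might, for example, proceed as in the following sketch, in which the spacing between consecutive frames is stretched inside a chosen time interval and compressed elsewhere. The function name accordion_positions and the scale factors expand and compress are assumptions introduced for illustration only.

```python
def accordion_positions(times, expanded, expand=2.0, compress=0.5):
    """Return z positions for frames with sorted timestamps `times`.

    Gaps between consecutive frames that lie wholly inside the
    (start, end) interval `expanded` are stretched by `expand`;
    all other gaps are shrunk by `compress`.
    """
    lo, hi = expanded
    z = [0.0]
    for prev, cur in zip(times, times[1:]):
        inside = lo <= prev and cur <= hi
        scale = expand if inside else compress
        z.append(z[-1] + (cur - prev) * scale)
    return z
```

With frames at times 0 to 4 seconds and the interval from 1 to 3 seconds expanded, the two middle gaps are doubled while the outer gaps are halved.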

If the user designates a tag created by the script then the 
script effects unit (214) fetches the required application 
program, reads the corresponding portion of the script and 
the related information required to carry out the portion of 
the script associated with the tag designated. If the interface 
is to be played in automatic mode then the script effects unit 
(214) fetches the entire script and all the related information 
required to carry out the script. When needed the zoom 
effects unit (210), the video play unit (211), and the special 
effects unit (212) may be called into play. If the script calls 
for user input such as required for carrying out a transaction, 
then a new window may be opened on the display device (or 
on a second display device) where the information is sup- 
plied and transmitted to the transaction/audit unit (207). In 
semi-automatic mode control of the viewing of the interface 
is passed between the script effects unit (214) and the 
navigation unit (206) as instructed by the user through the user
interaction unit (205).

Although the above-discussed preferred
embodiments of the present invention present certain
combinations of features, it is to be understood that the present
invention is not limited to the details of these particular
examples. Firstly, since image processing is performed on
image data in digital form, it is to be understood that in the
case where the video sequence consists of data in analogue
form, an analogue-to-digital converter or the like will be
used in order to provide image data in a form suitable for
processing. It is to be understood that the present invention
can be used to create interfaces to video sequences where the
video data is in compressed form, encrypted, etc. Secondly,
references above to user input or user selection processes
cover the use of any input device whatsoever operable by the
user including, but not limited to, a keyboard, a mouse (or
other pointing device), a touch screen or panel, glove input
devices, detectors of eye movements, voice actuated
devices, etc. Thirdly, references above to "displays" cover
the use of numerous different devices such as, but not
limited to, conventional monitor screens, liquid crystal
displays, etc.

Furthermore, for ease of comprehension the above
discussion describes interfaces according to the present
invention in which the respective root images each have a single
characteristic feature, such as giving a visual representation
of motion, or giving a visual representation of zoom, or
having a multi-threaded structure, etc. It is to be understood
that a single root image can combine several of these
features, as desired. Similarly, special effects such as object
extraction, the accordion effect, etc. have been described
separately. Again, it is to be understood that interfaces
according to the invention can be designed to permit any
desired combination of special effects.

What is claimed is:

1. An interface to an associated video sequence, the 
interface comprising: 

a) information defining a three-dimensional root image, 
the root image consisting of a plurality of basic frames
selected from said video sequence, and/or a plurality of
portions of video frames corresponding to selected
objects represented in the video sequence, x and y
directions in the root image corresponding to x and y
directions in the video frames and the z direction in the 
root image corresponding to the time axis whereby the 
basic frames are spaced apart from one another in the 
z direction of the root image by distances correspond- 
ing to the time separation between the respective video 
frames; 

b) means for displaying views of the root image; 

c) means for designating a viewing position relative to 
said root image; and 

d) means for calculating image data representing said
three-dimensional root image viewed from the
designated viewing position, and for outputting said
calculated image data to the displaying means.

2. An interactive interface according to claim 1, wherein
the designating means is user-operable means for designating
a viewing position relative to a displayed representation
of the root image.

3. An interactive interface according to claim 1 wherein
the means for calculating image data for display is adapted
to include in the calculated output image data, dependent
upon the designated viewing position, image data
corresponding to portions of basic frames which are not visible in
the reference view of the root image.

4. An interface according to claim 1 wherein the means
for calculating image data for display is adapted to include
in the calculated image data, dependent upon the distance
between the designated viewing position and the root image,
image data from frames of the video sequence additional to
the basic frames.

5. An interactive interface according to claim 4 wherein
the means for calculating image data for display is adapted
to select for use in calculating the image for display
additional frames chosen based upon criteria specified in
additional information stored in association with the root image
definition.

6. An interface according to claim 1, wherein the means
for calculating image data for display is adapted to calculate
output image data corresponding to a different number of
frames and/or a displayed image of enlarged or reduced size,
dependent upon the distance between the user-designated
viewing position and the root image.

7. An interface according to any previous claim, wherein
the video sequence includes image data representing one or
more selected objects, the means for calculating image data
for display being adapted, for each displayed frame
containing a respective selected object, selectively to output image
data causing display of only that image data which
corresponds to the selected object(s), causing the remainder of
the respective displayed frame to appear transparent.

8. An interface according to claim 7, wherein there is
provided means for the user to select objects represented in
the displayed image, and wherein the means for calculating
image data for display is adapted to output image data
causing portions of frames to appear transparent in response
to the selection of objects by the user.

9. An interface according to any one of claims 1-5 or 8 for
a video sequence comprising a main sequence of video
frames and at least one additional subsequence of video
frames constituting an alternative path to or from a particular
video frame in the main sequence, wherein the user can
access image information relating to an alternative
sub-sequence by designating a viewing position close to a point
in the root image corresponding to said particular video
frame, the means for calculating image data for display
being adapted to graft on to the displayed view of the root
image, at the branching point, a secondary root image
representing said alternative sub-sequence.

10. An interactive interface according to claim 9, wherein
by operation of the viewing position designating means the
user can navigate through root images and secondary root
images corresponding to the different possible scenarios
contained in the video sequence.

11. Apparatus for creation of an interface to a video
sequence, the apparatus comprising:

a) means for accessing image information in digital form
representing a video sequence;

b) means for creating a root image representing the video
sequence, the root image creation means comprising:

i) means for selecting a sub-set of frames from the
video sequence, or portions of said sub-set which
correspond to objects represented in the video
sequence, to serve as basic frames of the root image;
and

ii) means for setting the relative placement positions of
the basic frames in the root image, and

c) means for associating with the root image routines for
changing the displayed view of the root image
depending upon a designated viewing position relative to the
root image.

12. Apparatus according to claim 11 and further
comprising means for identification of objects represented in the
image information of the video sequence and for designating
objects as selectable by an end user.

13. Apparatus according to claim 11 wherein the means
for setting the relative placement positions of the basic
frames in the root image comprises means for accessing
stored information representing a plurality of templates and
means for inputting selection information designating one of
the stored templates.

14. Apparatus according to claim 11 wherein the means
for setting the relative placement positions of the basic
frames in the root image comprises means for detecting
motion in the video sequence and means for placing the
basic frames within the root image in relative positions
which provide a visual representation of said motion.

15. Apparatus according to claim 14 wherein the means
for placing the basic frames within the root image is adapted
to effect a progressive change in the dimensions of the basic
frames in the root image in order to visually represent a
zoom-in or zoom-out operation.

16. Apparatus according to claim 11, wherein the means
[illegible] adapted to select [illegible] background
information in the image.

17. Apparatus according to claim 11, and comprising
means for inputting parameters constraining the ways in
which the displayed view of the root image can be changed
depending upon a user-designated viewing position, the
constraint parameters being assimilated into the routines
associated with the root image by the associating means.

18. Apparatus according to claim 17 wherein the
constraint parameter inputting means is adapted to input data
identifying the rate at which additional frames should be
included in a displayed view of the root image when a
user-designated viewing position approaches the root image.

19. Apparatus according to claim 11, and comprising
means for creating secondary root images corresponding to
additional sub-sequences of video frames constituting
alternative paths to or from a particular video frame in the main
video sequence.

20. A process for creating an interface corresponding to a
predetermined video sequence, comprising the steps of:

a) retrieving the video sequence from a data store;

b) analyzing data corresponding to at least some frames of
the video sequence based upon at least one
predetermined algorithm;

c) selecting at least some frames from the video sequence
based at least in part on frame ranking measure
parameters stored in a frame ranking template store;

d) arranging the selected frames to form a succession of
frames defining at least in part the interface; and

e) transferring data corresponding to said selected and
arranged frames to an interface store.

21. A process according to claim 20 in which said step of
selection is conducted automatically.

22. A process according to claim 20 in which said step of
selection is conducted at least in part manually.

23. A process according to claim 20 in which said step of
selection is based at least in part on the degree of motion of
objects in the frames.

24. A process according to claim 20 in which said step of
selection is based at least in part on estimating camera
motion by separating foreground and background portions
of the images.

25. A process according to claim 20 in which said step of
selection is based at least in part on vector data in the digital
representation of images in the frames.




26. A process according to claim 20 further comprising 
the steps of: 

a) analyzing data corresponding to at least some frames of 
the video sequence in order to evaluate objects within 
said frames;

b) selecting at least one object from a plurality of the 
frames; 

c) selecting at least some frames from the video sequence 
based at least in part on said at least one object; 

d) tracking said at least one object through the selected 
frames; and 

e) arranging the selected frames based at least in part on 
said at least one object. 

27. A process according to claim 26 in which said step of
selecting said at least one object is conducted automatically. 

28. A process according to claim 26 in which said step of 
selecting said at least one object is conducted at least in part 
manually. 

29. A process according to claim 20 further comprising
the steps of: 

a) arranging the selected frames to form a succession of 
frames defining at least in part the interface; 

b) selecting at least one additional frame to add to the 
succession of frames corresponding to a new viewing 
position based at least in part on certain predetermined 
factors; and 

c) selecting at least one frame to remove from the suc- 
cession of frames corresponding to a new viewing 
position based at least in part on certain predetermined 
factors. 

30. A process according to claim 29 in which said step of 
selecting at least one additional frame to add is conducted 
automatically.

31. A process according to claim 29 in which said step of 
selecting at least one additional frame to add is conducted at 
least in part manually. 

32. A process according to claim 29 in which said step of 
selecting at least one additional frame to remove is con- 
ducted automatically. 

33. A process according to claim 29 in which said step of 
selecting at least one additional frame to remove is con- 
ducted at least in part manually.

34. A process according to claim 20 further comprising 
the steps of: 

a) arranging the selected frames based at least in part on
user specified criteria; and

b) calculating the arrangement of the selected frames
based at least in part on predetermined algorithms.

35. A process according to claim 20 further comprising 
the steps of: 

a) creating an interface data file which contains data 
corresponding at least in part to said interface; and 

b) storing said interface data file in a data store.

36. A process according to claim 20 comprising the 
further steps of: 
a) creating an effects detail file which contains data
corresponding to the said selection and arrangement of the
selected frames; and 

b) storing said effects detail file in a data store. 

37. A process according to claim 20 comprising the 
further steps of: 

a) creating a video sequence file which contains data 
corresponding to the selected frames; and

b) storing said video sequence file in a data store. 

38. A process according to claim 20 comprising the 
further steps of: 

a) extracting a predetermined set of information from the 
said interface; 

b) creating a script file consisting at least in part of said 
predetermined set of information; and 

c) storing the said script file in a data store. 

39. A method for processing an interface corresponding to 
a predetermined video sequence, comprising the steps of: 

a) retrieving the video sequence from a data store; 

b) analyzing data corresponding to at least some frames of 
the video sequence based upon at least one predeter- 
mined algorithm; 

c) selecting at least some frames from the video sequence 
based at least in part on frame ranking measure param- 
eters stored in a frame ranking template store; 

d) arranging the selected frames to form a succession of 
frames defining at least in part the interface; and 

e) transferring data capable of generating at least one 
image corresponding to said succession of frames to a 
viewer; 

f) generating an image from a desired perspective using 
said data capable of generating at least one image; and 

g) displaying said image on a display device. 

40. A method according to claim 39 further comprising 
the steps of: 

a) generating an image from a second desired perspective 
using said data capable of generating at least one 
image; and 

b) displaying said image on a display device. 

41. A method according to claim 39 in which the step of 
generating an image from a desired perspective using said 
data capable of generating at least one image comprises the 
steps of: 

a) determining frames in the interface to be dropped or
added;

b) calculating the position of all frames relative to each
other; and 

c) generating an image that renders said frames positioned 
appropriately relative to each other and that takes into 
account the predetermined perspective. 

* * * * *